Abstract



We show that using nonextensive entropy can lead to spontaneous 
symmetry breaking when a parameter changes its value from that appli- 
cable for a symmetric domain, as in field theory. We give the physical 
reasons and also show that even for symmetric Dirichlet priors, such a 
definition of the entropy and the parameter value can lead to asymmetry 
when entropy is maximized. 

1 Introduction 

Nonextensive entropies, such as that defined by Tsallis [1], or more recently
by us [2], among others [3, 4], differ from the conventional Boltzmann-Shannon
form, which is extensive in the sense of being additive when two subsystems in 
equilibrium are joined together. In nonextensive forms the combined value of 
entropy may be, in general, higher or lower than the sum of the entropies for 
the subunits joined. The deviation is, therefore, ascribable to interactions of a
nonrandom nature among the microsystems comprising each subunit.

The maximum value of extensive entropy occurs when the probabilities are 
equally distributed among all the possible states of the system. In other words 
the conventional entropy is maximal for the most symmetric distribution of the 
microsystems. In the present paper we show that for nonextensive entropies 
defined in terms of phase cell deformations, the maximal entropy may not
correspond to an equidistribution of probability among the states. 

2 Nonextensive Entropy 

The classical entropy

$$ S = -\sum_i p_i \log p_i \tag{1} $$

may be modified in several ways. The well-known Tsallis form generalizes the
logarithm:

$$ S_T = \frac{1 - \sum_i p_i^q}{q - 1}. \tag{2} $$

For our entropy we make the measure a fractal:

$$ S = -\sum_i p_i^q \log p_i, \tag{3} $$

with $q \to 1$ giving the classical Shannon entropy, as in the Tsallis case. In ref. [2]
we have given a detailed account of the different physical considerations that lead
to our expression, together with a comparison of the statistical mechanical properties
of the three entropies.
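
As a rough numerical illustration of the three entropies discussed in this section (a sketch, not code from ref. [2]), the following Python snippet evaluates the Shannon, Tsallis and present forms for a discrete distribution. The function names and example distributions are hypothetical; natural logarithms and the standard Tsallis form $(1 - \sum_i p_i^q)/(q - 1)$ are assumed. It also compares the entropy of two independent subsystems taken jointly with the sum of their separate entropies.

```python
import math

def shannon_entropy(p):
    """Shannon entropy -sum p_i log p_i (natural log); empty bins contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def tsallis_entropy(p, q):
    """Tsallis entropy (1 - sum p_i^q) / (q - 1), assuming q > 0 and q != 1."""
    return (1.0 - sum(x ** q for x in p if x > 0.0)) / (q - 1.0)

def fractal_entropy(p, q):
    """Entropy considered in this paper, -sum p_i^q log p_i, assuming q > 0."""
    return -sum((x ** q) * math.log(x) for x in p if x > 0.0)

def product_distribution(pa, pb):
    """Joint distribution of two independent subsystems."""
    return [a * b for a in pa for b in pb]

if __name__ == "__main__":
    pa = [0.5, 0.3, 0.2]   # hypothetical example distributions
    pb = [0.7, 0.3]
    q = 1.5
    joint = product_distribution(pa, pb)
    print("S(A) + S(B):", fractal_entropy(pa, q) + fractal_entropy(pb, q))
    print("S(A x B):   ", fractal_entropy(joint, q))
    print("q = 1 check:", shannon_entropy(pa), fractal_entropy(pa, 1.0))
```

For independent subsystems the present form satisfies $S(A \times B) = (\sum_j b_j^q)\, S(A) + (\sum_i a_i^q)\, S(B)$, which follows from $\log(ab) = \log a + \log b$ and reduces to plain additivity only at $q = 1$.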

The justification for choosing Shannon or any other more generalized entropy,
such as that of Tsallis or Renyi, or the one we have presented elsewhere [3],
lies eventually in the relevance or "good fit" such an entropy would produce in
the data corresponding to a situation where the presence or lack of interactions
among the members, or other considerations, suggest the need for a proper choice.
However, data are always finite, and a probability distribution is the limit of
relative frequencies for an infinite sample. One therefore faces the problem of
estimating the best PDF from a finite sample $\{n_i\}$. This PDF may be subject to
the constraint of a known entropy, in whatever way defined, as a functional of
the PDF.



Mathematically, the problem of determining the best posterior PDF, given
a rough prior PDF and data points, is expressed formally by Bayes' theorem.
However, the constraint of constant entropy makes the functional integral
impossible to handle even for a fairly simple prior, as found by Wolpert and Wolf
[8] and by Nemenman, Shafee and Bialek [5]. The integrals involved were first
considered in a general context in [6], and the question of priors was addressed
in [7, 8]. It was discovered that, though the integral for the posterior was
intractable, the moments of the entropy could be calculated with relative ease.

In [5] it has also been shown that for Dirichlet type priors [9],

$$ P_\beta(\{p_i\}) \propto \delta\Big(1 - \sum_{i=1}^{K} p_i\Big) \prod_{i=1}^{K} p_i^{\beta - 1}, \tag{4} $$

in particular (which give nice analytic moments with exact integrals and hence
are hard to ignore), the Shannon entropy is fixed by the exponent $\beta$ of the
probabilities chosen for small data samples. Hence not much information
is obtained for unusual distributions, such as that of Zipf; i.e. a prior has to
be wisely guessed for any meaningful outcome. As a discrete set of bins has
no metric, or even a useful topology that can be made use of in Occam-razor
type smoothing, other tricks were suggested in that paper to overcome the
insensitivity of the entropy.

We have noted already that the PDF associated with our proposed entropy
differs from that of the Shannon entropy by only a power of $p_i$, but this changes
the symmetry of the integrations for the moments for the different terms for
different bins. We shall, therefore, examine in this paper whether the nature of the
moments is sufficiently changed by our entropy to indicate cases where data
can pick this entropy in preference to Shannon or other entropies.



3 Priors and Moments of Entropy 

For completeness, we mention here the formalism developed by Wolpert and 
Wolf [8]. The uniform PDF is given by 



$$ P_{\mathrm{unif}}(\{p_i\}) = \frac{1}{Z_{\mathrm{unif}}}\, \delta\Big(1 - \sum_{i=1}^{K} p_i\Big), \qquad Z_{\mathrm{unif}} = \int_V dp_1 \cdots dp_K\, \delta\Big(1 - \sum_{i=1}^{K} p_i\Big), \tag{5} $$

where the $\delta$ function enforces normalization of the probabilities and $Z_{\mathrm{unif}}$ is the total
volume occupied by all models. The integration domain $V$ is bounded by each
$p_i$ in the range [0, 1]. Because of the normalization constraint, any specific set of
$\{p_i\}$ chosen from this distribution is not uniformly distributed, and "uniformity"
means simply that all distributions that obey the normalization constraint are
equally likely a priori.

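We can find the probability of the model $\{p_i\}$ with Bayes' rule as

$$ P(\{p_i\}|\{n_i\}) = \frac{P(\{n_i\}|\{p_i\})\, P_{\mathrm{unif}}(\{p_i\})}{P_{\mathrm{unif}}(\{n_i\})}, \qquad P(\{n_i\}|\{p_i\}) = \prod_{i=1}^{K} (p_i)^{n_i}. \tag{6} $$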

Generalizing these ideas, we have considered priors with a power-law dependence
on the probabilities, calculated as

$$ P_\beta(\{p_i\}) = \frac{1}{Z(\beta)}\, \delta\Big(1 - \sum_{i=1}^{K} p_i\Big) \prod_{i=1}^{K} p_i^{\beta - 1}. \tag{7} $$

It has been shown that generating the $p_i$'s in sequence ($i = 1 \to K$) from
the Beta distribution

$$ P(p_i) = B\Big(\frac{p_i}{1 - \sum_{j<i} p_j};\ \beta,\ (K - i)\beta\Big), \qquad B(x; a, b) = \frac{x^{a-1}(1 - x)^{b-1}}{B(a, b)}, \tag{8} $$

gives the probability of the whole sequence $\{p_i\}$ as $P_\beta(\{p_i\})$.

Random simulation of PDFs with different shapes (a few bins occupied
versus more spread-out ones) shows that the entropy depends largely on the
parameter $\beta$ of the prior; hence sparse data have virtually no role in determining
the shape of the output distribution. This would seem unsatisfactory, and some
adjustments appear to be needed to get any useful information out.
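
The sequential generation of the $p_i$ described in this section can be sketched in Python using only the standard library. This is an illustrative sketch, assuming the usual stick-breaking rule in which the fraction of the still unassigned probability given to bin $i$ is drawn from a Beta distribution with parameters $\beta$ and $(K - i)\beta$; the function names, seed and parameter values are hypothetical choices, not taken from the source.

```python
import math
import random

def sample_dirichlet_sequential(K, beta, rng):
    """Draw {p_i} from a symmetric Dirichlet prior by generating bins in sequence."""
    remaining = 1.0
    probs = []
    for i in range(1, K):
        # Fraction of the not-yet-assigned probability that goes to bin i.
        frac = rng.betavariate(beta, (K - i) * beta)
        p = remaining * frac
        probs.append(p)
        remaining -= p
    probs.append(remaining)  # the last bin is fixed by normalization
    return probs

def shannon_entropy(p):
    """Shannon entropy in nats; empty bins contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def mean_shannon_entropy(K, beta, n_samples, seed=0):
    """Average Shannon entropy of distributions drawn from the prior."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        total += shannon_entropy(sample_dirichlet_sequential(K, beta, rng))
    return total / n_samples

if __name__ == "__main__":
    for beta in (0.1, 1.0, 10.0):
        print(beta, mean_shannon_entropy(K=20, beta=beta, n_samples=2000))
```

Running the loop for several values of $\beta$ gives a feel for how strongly the prior exponent alone controls the typical entropy of the sampled distributions.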

We shall not repeat here the methods and results of [8], which considers only
Shannon entropy.



4 Comparison of Shannon and Our Entropy 

In our case, with the entropy function given by Eqn. 3, we note that the calculation does not
involve a simple replacement of the exponents $n_i$ of $p_i$ by $n_i + q - 1$ in the case of
the Dirichlet prior (Eqn. 4) in the product involved in the moment determination
integrals given in [8], but a complete recalculation of the moment, using the
same techniques given in [8]. Apparently, the maximal value of the entropy should
correspond to the flattest distribution, i.e.

$$ S_{\max} = K^{(1-q)} \log(K). \tag{9} $$

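In the limit of extremely sparse, nearly zero data ($n_i = 0$), we get for the
first moment, i.e. the expected entropy,

$$ \frac{\langle S_1 \rangle}{\langle S_0 \rangle} = \frac{K\, \Gamma(\beta + q)\, \Gamma(K\beta)}{\Gamma(\beta)\, \Gamma(K\beta + q)}\, \Delta\Phi^{(1)}(K\beta + q,\ \beta + q), \tag{10} $$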

where we have for conciseness used the notation of ref. [8],

$$ \Delta\Phi^{(p)}(a, b) = \Psi^{(p-1)}(a) - \Psi^{(p-1)}(b), \tag{11} $$

$\Psi^{(n)}(x)$ being the polygamma function of order $n$ of the argument $x$. It can be
checked easily that this expression reduces to that in ref. [5] when $q = 1$, i.e.
when we use Shannon entropy.
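
A closed form of this kind can be checked numerically. The sketch below is an independent cross-check under stated assumptions, not the author's code: it assumes a symmetric Dirichlet prior with exponent $\beta$ and no data, so that each $p_i$ is marginally Beta-distributed with parameters $\beta$ and $(K-1)\beta$, which gives $\langle S \rangle = K\,\Gamma(\beta+q)\Gamma(K\beta)/[\Gamma(\beta)\Gamma(K\beta+q)]\,[\psi(K\beta+q) - \psi(\beta+q)]$ for $S = -\sum_i p_i^q \log p_i$, with $\psi$ the digamma function. The helper names, the series-based digamma routine and the parameter values are illustrative choices.

```python
import math
import random

def digamma(x):
    """Digamma function for x > 0: upward recurrence, then an asymptotic series."""
    result = 0.0
    while x < 6.0:
        result -= 1.0 / x
        x += 1.0
    inv2 = 1.0 / (x * x)
    series = inv2 * (1.0 / 12.0 - inv2 * (1.0 / 120.0 - inv2 / 252.0))
    return result + math.log(x) - 0.5 / x - series

def expected_entropy(K, beta, q):
    """Prior mean of -sum p_i^q log p_i under a symmetric Dirichlet(beta) prior."""
    log_prefactor = (math.log(K) + math.lgamma(beta + q) + math.lgamma(K * beta)
                     - math.lgamma(beta) - math.lgamma(K * beta + q))
    return math.exp(log_prefactor) * (digamma(K * beta + q) - digamma(beta + q))

def flat_entropy(K, q):
    """Entropy of the flat distribution p_i = 1/K."""
    return K ** (1.0 - q) * math.log(K)

def monte_carlo_entropy(K, beta, q, n_samples, seed=0):
    """Estimate the same mean by sampling; normalized Gamma draws are Dirichlet."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        g = [rng.gammavariate(beta, 1.0) for _ in range(K)]
        norm = sum(g)
        total += -sum((x / norm) ** q * math.log(x / norm) for x in g if x > 0.0)
    return total / n_samples

if __name__ == "__main__":
    print(expected_entropy(5, 0.5, 1.5), monte_carlo_entropy(5, 0.5, 1.5, 20000))
    for q in (0.5, 1.0, 1.5):
        print(q, expected_entropy(1000, 0.01, q) / flat_entropy(1000, q))
```

At $q = 1$ the prefactor collapses to unity and the function returns $\psi(K\beta + 1) - \psi(\beta + 1)$, the Shannon-case limit. The final loop prints the ratio of the expected entropy to the flat-distribution value for one small-$\beta$, large-$K$ setting.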



5 Results for Mean Entropy 

So, unlike the Shannon case, where $q$ is fixed at unity, we now have a parameter $q$
that may produce a difference. In Figs. 1-3 we show the
variation of the ratio $\langle S_1 \rangle / S_{\max}$ with variable bin number $K$. In ref. [5] we
have commented on how insensitive the Dirichlet prior [9] is when Shannon entropy
is considered in the straightforward manner given in ref. [8]. In our generalized
form of the entropy, we note that by changing the parameter specific to our
form of the entropy to $q > 1$, we get a peak for small $\beta$ and large $K$ values.

This peak allows us to choose uniform Dirichlet priors with an appropriate $q$
value that would nevertheless lead to an asymmetry not possible with Shannon
entropy. In other words, instead of the priors, we can feed the information
about the expected asymmetry of the PDF to the entropy, with no need to choose
particular bins. The nonextensivity of our entropy, coming possibly from interaction
among the units, gives rise to situations where the entropy maxima do
not increase with the number of bins like $\log(K)$ but, being $K^{(1-q)} \log(K)$, may
be extended or squeezed, according to the value of $q$ being less than or greater
than unity.

Figure 1: Ratio of the expected value (first moment) of the new entropy to $S_{\max}$, plotted against
bin number $K$ and prior exponent $\beta$ for entropy parameter $q = 0.5$.

Figure 2: Same as Fig. 1 but for $q = 1.0$, i.e. Shannon entropy.

Figure 3: As previous two figures, but for $q = 1.5$.
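
The dependence of the flat-distribution value $K^{(1-q)} \log(K)$ on the number of bins can be tabulated directly. The sketch below assumes natural logarithms and uses hypothetical function names; it evaluates the closed form and compares it with the entropy of an explicitly flat distribution.

```python
import math

def flat_maximum(K, q):
    """Closed form K^(1-q) log K for the flat distribution p_i = 1/K."""
    return K ** (1.0 - q) * math.log(K)

def entropy(p, q):
    """-sum p_i^q log p_i, skipping empty bins."""
    return -sum((x ** q) * math.log(x) for x in p if x > 0.0)

if __name__ == "__main__":
    for q in (0.5, 1.0, 1.5):
        row = [flat_maximum(K, q) for K in (10, 100, 1000)]
        print(q, row)
    # Direct evaluation on a flat distribution, for comparison with the closed form.
    print(entropy([1.0 / 50] * 50, 1.5), flat_maximum(50, 1.5))
```

Differentiating $K^{(1-q)} \log K$ with respect to $K$ shows that, for $q > 1$, the flat value passes through a maximum at $K = e^{1/(q-1)}$ and then decreases, whereas for $q \le 1$ it grows monotonically with $K$.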

6 Spontaneous Symmetry Breaking 

The interesting thing to note is that for $q > 1$ and large $K$ at small prior
parameter $\beta$, the entropy peak exceeds the normally expected expression in
Eqn. 9 for full symmetry; that is, the expected value of entropy is seen to exceed the
formal maximum. The clustering or repulsive effects change the measure of
disorder from the Shannon type entropy. So the highest expected value of
entropy may correspond not to a uniformly distributed population, but to
one in which only a smaller subset is populated. This means that
for our entropy the most uniform distribution is not the least informative: the
weighting distorts it to an uneven distribution for the expected maximal entropy
value. This result is in some ways similar to spontaneous symmetry breaking
in field theory, where the variation of a parameter leads to broken-symmetry
energy minima.

A neater view of these results can be seen in Figs. 4-6, with $K$ values fixed.

We have not obtained the second moment, i.e. the standard deviation,
or the spread, of the entropy distribution, because, with our entropy and an
arbitrary $q$, the expressions cannot be obtained in the simple form of ref. [5].
We can, however, expect that the variation of the higher moments from the
Shannon case will be less than that of the first moment, because higher derivatives of
the $\Gamma$ functions are smoother. We shall assume the spreads are narrow enough
to concentrate on the first moments only.

Figure 4: Clearer view in a 2-dimensional plot, with $K = 10$. Red, green and blue
lines are for $q = 0.5$, 1.0 and 1.5, respectively.

Figure 5: As Fig. 4 but for bin number $K = 100$.

Figure 6: As previous two figures, but for $K = 1000$.

Apart from the PDF estimates above, this picture of broken symmetry for
the maximal entropy when the parameter $q > 1$ is also manifest directly in an
explicit calculation of the entropy using our prescription with a simple three-state
system. The symmetric expected maximal entropy in this case should
be

$$ S_{\max} = -3 p^q \log p \tag{12} $$

with $p = 1/3$.

With two of the probabilities, $p_1$ and $p_2$, running free from 0 to 1 under the
constraint $p_1 + p_2 + p_3 = 1$, the entropy is

$$ S = -p_1^q \log p_1 - p_2^q \log p_2 - (1 - p_1 - p_2)^q \log(1 - p_1 - p_2). \tag{13} $$

We plot $S / S_{\max}$ in Figs. 7 and 8. For $q = 2.44$ we obtain the most interesting
behavior, with a local maximum at the point of symmetry $p_1 = p_2 = p_3 = 1/3$,
which is not the global maximum. For $q < 1$ the symmetry point gives the
global maximum.
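
The three-state calculation can be reproduced with a brute-force scan of the probability simplex. The following Python sketch is an illustration under stated assumptions (natural logarithms, the convention that an empty state contributes zero, and a hypothetical grid resolution `n`); it compares the entropy at the symmetry point with the largest value found on the grid.

```python
import math

def term(p, q):
    """Single-state contribution -p^q log p, taken as 0 for an empty state."""
    return 0.0 if p <= 0.0 else -(p ** q) * math.log(p)

def entropy3(p1, p2, p3, q):
    """Entropy of a three-state system with probabilities p1, p2, p3."""
    return term(p1, q) + term(p2, q) + term(p3, q)

def symmetric_value(q):
    """Entropy at the symmetry point p1 = p2 = p3 = 1/3."""
    return entropy3(1.0 / 3.0, 1.0 / 3.0, 1.0 / 3.0, q)

def grid_maximum(q, n=300):
    """Largest entropy on the grid p = (i/n, j/n, (n-i-j)/n) and where it occurs."""
    best_value, best_point = -1.0, None
    for i in range(n + 1):
        for j in range(n + 1 - i):
            point = (i / n, j / n, (n - i - j) / n)
            value = entropy3(*point, q)
            if value > best_value:
                best_value, best_point = value, point
    return best_value, best_point

if __name__ == "__main__":
    for q in (0.5, 2.44):
        value, point = grid_maximum(q)
        print(q, symmetric_value(q), value, point)
```

Comparing the two printed values for each $q$ shows whether the symmetry point is the global maximum on the grid or only a local one.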




Figure 7: Our entropy for a three-state system, with parameter $q = 2.44$, as two
independent probabilities $p_1$ and $p_2$ are varied with the constraint $p_1 + p_2 + p_3 = 1$.
The expected maximum at the symmetry point $p_1 = p_2 = p_3$ turns out to be
a local maximum. The global maxima are not at the end points, with one of
the probabilities going up to unity and the others vanishing, which give zero
entropy as expected, but occur near such end points, as shown clearly in the
next figure.

Figure 8: Two-dimensional version of the previous Fig. 7 with $p_1 = 1/3$ fixed,
so that only $p_2$ varies. This shows a clearer picture of the local maximum at the
symmetry point and the global maxima near the end points.

A physical explanation of the SSB may be the distortion introduced by
nonrandom interactions in the volumes of the 'phase space' of the states. In
ref. [2] we have shown how the new entropy is related to such volumes, in terms
of the Shannon coding theorem. Apparently this distortion introduces a mixing of
states that reduces the weights of clearly defined states and hence introduces
a new measure of uncertainty not present in the case of Shannon entropy. As
a result it is entropically preferable to leave some states underpopulated and
increase the total entropy by overpopulating others. In other words, we have a
reduction of the problem from $N$ states to fewer, but with a measure factor whose
smaller diminution overcompensates the decrease in the logarithmic factor. In
a field-theoretic model with the Lagrangian

$$ \mathcal{L} = \mu^2 |\phi|^2 + \lambda |\phi|^4 \tag{14} $$

for $\mu^2$ and $\lambda$ with opposite signs, the lowest energy state is the symmetric
vacuum (no state occupied), and for the same sign the vacuum becomes a local
maximum, with a ring of minima at $|\phi| = |\mu / \sqrt{2\lambda}|$, which forces us to choose
a unique vacuum with a particular complex $\phi$ having this magnitude. In the
case of entropy, for $q < 1$ we have the highest entropy for all states equally
populated, and for $q > 1$ the configuration with a symmetric flat PDF is no
longer the one with the highest entropy.

7 Conclusions 

We have seen that average entropies corresponding to uniformly symmetric
Dirichlet type priors can be obtained exactly even for nonextensive entropies
of a type we have described earlier. Remarkably, this entropy shows maxima for
asymmetric probability distributions, which can be considerably higher than the
value for the symmetric distribution, unlike Shannon entropy. We think this asymmetry is
a consequence of the distortion of the 'phase cells' associated with the states,
which may in turn be due to nonrandom interactions.

The author thanks Prof. Phil Broadbridge of the Australian Mathematical 
Sciences Institute for encouragement. 